-
-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tool to downsample a BAM while retaining reads in low coverage areas. #893
Conversation
Codecov ReportBase: 95.66% // Head: 95.65% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #893 +/- ##
==========================================
- Coverage 95.66% 95.65% -0.01%
==========================================
Files 125 126 +1
Lines 7239 7294 +55
Branches 507 487 -20
==========================================
+ Hits 6925 6977 +52
- Misses 314 317 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments, one potential bug.
|
||
/** Returns the coverage at the given genomic position, or -1 if the position is not between start:end. */ | ||
def apply(i: Int): Int = { | ||
if (i < start || i > end) -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider either returning 0
(zero coverage) or None
(no coverage) instead of -1
, which feels very Java-like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think actually I can just go back to having it fail if the index is out of bounds.
*/ | ||
def add(t: Template): Boolean = { | ||
val recs = t.allReads | ||
.filterNot(_.secondary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This made me think about supplementary reads. Should those be upgraded to "primary" alignments or be filtered out as well? I'm thinking of a region where we have a large number of both primary and supplementary alignments, and how to pick there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely want to include supplementary reads, they're just part of a split-read alignment and therefore contribute to coverage. E.g. imagine an alignment to a circular genome where anything that maps over the break point will have a primary+supplementary, we want to count both of those.
.sortBy(r => (r.refName, r.start, r.end)) | ||
.iterator | ||
.map(r => new Interval(r.refName, r.start, r.end)) | ||
new IntervalMergerIterator(iter, true, false, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sad we can't call arguments by name, since what are the three booleans here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. For reference they are:
- combineAbutting
- enforceSameStrand
- concatenateNames
src/main/scala/com/fulcrumgenomics/bam/DownsampleAndNormalizeBam.scala
Outdated
Show resolved
Hide resolved
.toSeq | ||
|
||
val addsCoverage = recs.exists { rec => | ||
detector.getOverlaps(rec.asSam).iterator().exists { cov => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's time to have getOverlaps
return an iterator itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think perhaps it's time to move on from HTSJDK's OverlapDetector and implement one in scala that suits our needs better :/ But not today.
Still needs tests.